A Framework for MDL Clustering
Abstract
Data clustering is one of the central concepts in unsupervised data analysis and machine learning, but it is also a surprisingly controversial one: the very meaning of the term “clustering” varies a great deal between scientific disciplines (see, e.g., [1] and the references therein). A common goal in all cases, however, is to find a structural representation of the data by grouping similar (in some sense) data items together. In our work we have focused on non-hierarchical (flat) clustering, where clustering is regarded as a partitional data assignment or data labeling problem: the goal is to partition the data into mutually exclusive clusters so that similar data items, in a sense that needs to be defined, are grouped together. The number of clusters is unknown, and determining the optimal number is part of the clustering problem. The data are assumed to be in vector form, so that each data item is a vector consisting of a fixed number of attribute values. We can now identify two fundamental problems within this framework:
Similar Resources
Robust Information Clustering on MDI
We propose a robust framework for determining a natural clustering of a given dataset, based on the minimum description length (MDL) principle. The proposed framework, robust information-theoretic clustering (RIC), is orthogonal to any known clustering algorithm. Given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously deter...
Computationally Efficient Methods for MDL-Optimal Density Estimation and Data Clustering
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. T...
MDL Histogram Density Estimation
We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle, which can be applied for tasks such as data clustering, density estimation, image denoising and model selection in general. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has se...
MDL-Based Unsupervised Attribute Ranking
In the present paper we propose an unsupervised attribute ranking method based on evaluating the quality of clustering that each attribute produces by partitioning the data into subsets according to its values. We use the Minimum Description Length (MDL) principle to evaluate the quality of clustering and describe an algorithm for attribute ranking and a related clustering algorithm. Both algor...
Robust growing neural gas algorithm with application in cluster analysis
We propose a novel robust clustering algorithm within the Growing Neural Gas (GNG) framework, called the Robust Growing Neural Gas (RGNG) network. The Matlab codes are available from . By incorporating several robust strategies, such as an outlier-resistant scheme, adaptive modulation of learning rates and a cluster repulsion method, into the traditional GNG framework, the proposed RGNG network possesses ...